Exploring Translation Similarities for Building a Better Sentence Aligner
Abstract
The approaches previously used for sentence alignment (sentence length, word correspondence and cognate matching) take into account different aspects of similarity between the source and the target language sentences. In this paper we discuss various aspects of similarity in translated texts that can be used for sentence alignment. We then describe a customizable method for combining several approaches that can exploit these aspects of similarity. This method also includes a novel way of using sentence length for alignment. Finally, the results of evaluation for this composite method (overall and at various stages) are presented. These results are compared with those for some previous approaches.
Related resources
Acquis Communautaire Sentence Alignment using Support Vector Machines
Sentence alignment is a task that requires not only accuracy, since errors can affect further processing, but also modest computational resources and language-pair independence. Although many implementations do not use translation equivalents because they depend on the language pair, this feature is required to increase accuracy. The paper presents a hybrid sen...
Corpus Aligner (CorAl) Evaluation on English-Croatian Parallel Corpora
An increasing demand for new language resources for recent EU members and acceding countries has in turn initiated the development of different language tools and resources, such as alignment tools and corresponding translation memories for new language pairs. The primary goal of this paper is to provide a description of a free sentence alignment tool CorAl (Corpus Aligner), developed at the F...
CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units
Currently available alignment tools and procedures for marking-up alignments overlook non-contiguous multiword units for being too complex within the bounds of the proposed alignment methodologies. This paper presents the CLUE-Aligner (Cross-Language Unit Elicitation Aligner), a web alignment tool designed for manual annotation of pairs of paraphrastic and translation units, representing both c...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Parallel Corpora based Translation Resources Extraction
This paper describes NATools, a toolkit to process, analyze and extract translation resources from parallel corpora. It includes a sentence aligner, a probabilistic translation dictionary extractor, a word aligner, a corpus server, a set of tools to query corpora and dictionaries, and a set of tools to extract bilingual resources.
Publication date: 2007